Smart Paper Notes

Nix-TTS: Lightweight and End-to-End Text-to-Speech via Module-wise Distillation

Rendi Chevi, Radityo Eko Prasojo, Alham Fikri Aji, Andros Tjandra, Sakriani Sakti

Categories: Natural Language Processing | Machine Learning | Neural and Evolutionary Computing

2022-03-29

Several solutions for lightweight TTS have shown promising results. Still, they either rely on a hand-crafted design that reaches a suboptimal size or use neural architecture search, which often incurs high training costs. We present Nix-TTS, a lightweight TTS model obtained via knowledge distillation from a high-quality yet large, non-autoregressive, end-to-end (vocoder-free) TTS teacher model. Specifically, we offer module-wise distillation, enabling flexible and independent distillation of the encoder and decoder modules. The resulting Nix-TTS inherits the advantageous properties of being non-autoregressive and end-to-end from the teacher, yet is significantly smaller, with only 5.23M parameters, or up to an 89.34% reduction relative to the teacher model. It also achieves over 3.04x and 8.36x inference speedup on an Intel i7 CPU and a Raspberry Pi 3B, respectively, while retaining fair voice naturalness and intelligibility compared to the teacher model. We provide pretrained models and audio samples of Nix-TTS.

translated by Google Translate
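
The module-wise distillation described in the abstract can be illustrated with a minimal sketch. The abstract does not state the loss functions or how the modules are wired during training, so everything here is an assumption made for illustration: mean squared error stands in for the real distillation losses, the student decoder is fed the teacher's latent so that its loss does not depend on the student encoder, and the function name `module_wise_losses` and the toy encoder and decoder callables are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def module_wise_losses(
    text: str,
    teacher_encoder: Callable[[str], List[float]],
    teacher_decoder: Callable[[Vector], List[float]],
    student_encoder: Callable[[str], List[float]],
    student_decoder: Callable[[Vector], List[float]],
) -> Tuple[float, float]:
    """Return (encoder_loss, decoder_loss), each computed independently.

    Assumption: MSE is a placeholder loss, not the paper's actual objective.
    """
    # Teacher forward pass: the latent is the intermediate distillation target.
    teacher_latent = teacher_encoder(text)
    teacher_audio = teacher_decoder(teacher_latent)

    # Encoder distillation: the student encoder imitates the teacher latent.
    encoder_loss = mse(student_encoder(text), teacher_latent)

    # Decoder distillation: the student decoder receives the TEACHER latent,
    # so this loss is unaffected by how well the student encoder is trained.
    decoder_loss = mse(student_decoder(teacher_latent), teacher_audio)
    return encoder_loss, decoder_loss


# Toy stand-ins (hypothetical): characters become floats, "audio" doubles them.
def toy_teacher_encoder(text: str) -> List[float]:
    return [float(ord(c)) for c in text]


def toy_teacher_decoder(latent: Vector) -> List[float]:
    return [2.0 * v for v in latent]


def toy_student_encoder(text: str) -> List[float]:
    return [float(ord(c)) + 1.0 for c in text]  # off by one everywhere


def toy_student_decoder(latent: Vector) -> List[float]:
    return [2.0 * v for v in latent]  # matches the teacher decoder exactly


enc_loss, dec_loss = module_wise_losses(
    "hi",
    toy_teacher_encoder,
    toy_teacher_decoder,
    toy_student_encoder,
    toy_student_decoder,
)
print(enc_loss, dec_loss)
```

In this toy setup the imperfect student encoder yields a nonzero encoder loss while the decoder loss stays at zero, which is the sense in which the two modules can be distilled independently.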

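We perform a knowledge distillation (KD) benchmark from task-specific BERT-base teacher models to various student models: BiLSTM, CNN, BERT-Tiny, BERT-Mini, and BERT-Small. Our experiments involve 12 datasets grouped into two tasks: text classification and sequence labeling in the Indonesian language. We also compare various aspects of distillation, including the use of word embeddings and unlabeled data augmentation. Our experiments show that, despite the rising popularity of Transformer-based models, BiLSTM and CNN student models provide the best trade-off between performance and computational resources (CPU, RAM, and storage) compared to pruned BERT models. We further propose some quick wins for producing small NLP models through efficient KD training mechanisms that involve simple choices of loss functions, word embeddings, and unlabeled data preparation.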

translated by Google Translate
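
The abstract above mentions "simple choices of loss functions" for distillation without naming them. As a generic illustration only, the sketch below shows two commonly used KD objectives: a temperature-softened KL divergence between teacher and student distributions, and a plain mean squared error on the logits. Which of these (if either) the authors recommend is not stated here; the function names and the default temperature are assumptions.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with a temperature parameter."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_target_kl(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    temperature: float = 2.0,  # assumed default, not taken from the source
) -> float:
    """KL(teacher || student) on temperature-softened distributions."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return sum(
        t * (math.log(t) - math.log(s))
        for t, s in zip(p_teacher, p_student)
    )


def logit_mse(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
) -> float:
    """Mean squared error computed directly on the raw logits."""
    if len(student_logits) != len(teacher_logits):
        raise ValueError("logit vectors must have the same length")
    return sum(
        (s - t) ** 2 for s, t in zip(student_logits, teacher_logits)
    ) / len(teacher_logits)


teacher = [2.0, 0.5, -1.0]
print(soft_target_kl(teacher, teacher))          # identical logits
print(soft_target_kl([0.0, 0.0, 0.0], teacher))  # uninformed student
print(logit_mse([1.0, 2.0], [1.0, 4.0]))
```

Both losses are zero when the student reproduces the teacher's logits and grow as the student's predictions drift away, so either can serve as the training signal for a small student such as a BiLSTM or CNN.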